The mRMR variable selection method: a comparative study for functional data
Authors
Abstract
The use of variable selection methods is particularly appealing in statistical problems with functional data. The obvious general criterion for variable selection is to choose the ‘most representative’ or ‘most relevant’ variables. However, a purely relevance-oriented criterion could lead to selecting many redundant variables. The mRMR (minimum Redundancy Maximum Relevance) procedure, proposed by Ding and Peng (2005) and Peng et al. (2005), is an algorithm that performs variable selection systematically, achieving a reasonable trade-off between relevance and redundancy. In its original form, this procedure uses the so-called mutual information criterion to assess relevance and redundancy. Keeping the focus on functional data problems, we propose here a modified version of the mRMR method, obtained by replacing the mutual information with the new association measure (called distance correlation) suggested by Székely et al. (2007). We have also performed an extensive simulation study, including 1600 functional experiments (100 functional models × 4 sample sizes × 4 classifiers) and three real-data examples, aimed at comparing the different versions of the mRMR methodology. The results are quite conclusive in favor of the newly proposed alternative.
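As a rough illustration of the idea in the abstract, the sketch below runs a greedy mRMR-style selection in which the sample distance correlation plays the role of the association measure. It is a minimal sketch under stated assumptions, not the authors' implementation: each column of `X` is assumed to hold the values of the curves at one grid point, the "relevance minus mean redundancy" score is only one common way of combining the two terms, and the names `distance_correlation` and `mrmr_select` are hypothetical.

```python
import numpy as np


def _centered_distances(v):
    """Double-centered matrix of pairwise Euclidean distances between samples."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal length."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()  # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def mrmr_select(X, y, k):
    """Greedy selection of k columns of X (assumed difference criterion)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Relevance: association between each candidate variable and the response.
    relevance = np.array(
        [distance_correlation(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Redundancy: association with the variable selected most recently,
        # accumulated so that the mean over all selected variables is available.
        redundancy_sum += np.array(
            [distance_correlation(X[:, j], X[:, last]) for j in range(n_features)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same variable twice
        selected.append(int(np.argmax(score)))
    return selected
```

Calling `mrmr_select(X, y, k)` returns the indices of the chosen columns in the order they were picked. Replacing `distance_correlation` with a mutual information estimate would give a selection closer to the original form of the procedure.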
Similar resources
A New Framework for Distributed Multivariate Feature Selection
Feature selection is an important issue in classification. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...
Using covariates for improving the minimum Redundancy Maximum Relevance feature selection method
Maximizing the joint dependency with a minimal number of variables is generally the main task of feature selection. To obtain a minimal subset while maximizing the joint dependency with the target variable, the redundancy among the selected variables must be reduced to a minimum. In this paper, we propose a method based on the recently popular minimum Redundancy-Maximum Relevance (mRMR) crit...
Mrmr Ba: a Hybrid Gene Selection Algorithm for Cancer Classification
Microarray technology helps biologists monitor the activity of thousands of genes (features) in one experiment. This technology generates gene expression data, which are highly useful for cancer classification. However, gene expression data are high-dimensional and contain irrelevant, redundant, and noisy genes that are unnecessary from the classifica...
A Robust Supervised Variable Selection for Noisy High-Dimensional Data
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while contr...
Feature Selection For Genomic Data By Combining Filter And Wrapper Approaches
Gene expression data usually contains a large number of genes, but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminate biological samples of different types. In this paper, we propose a two-stage selection algorithm for genomic data by combining MRMR (Minimum Redundancy Maximum Relevance) and GA (Genetic Algorithm): In the ...